Abstract

We prove an optimal $\Omega(n)$ lower bound on the randomized communication complexity
of the much-studied GAP-HAMMING-DISTANCE problem. As a consequence, we obtain essentially
optimal multi-pass space lower bounds in the data stream model for a number of
fundamental problems, including the estimation of frequency moments.

The GAP-HAMMING-DISTANCE problem is a communication problem, wherein Alice and
Bob receive $n$-bit strings $x$ and $y$, respectively. They are promised that the Hamming distance
between $x$ and $y$ is either at least $n/2 + \sqrt{n}$ or at most $n/2 - \sqrt{n}$, and their goal is to decide
which of these is the case. Since the formal presentation of the problem by Indyk and Woodruff 
(FOCS, 2003), it had been conjectured that the naive protocol, which uses n bits of communi- 
cation, is asymptotically optimal. The conjecture was shown to be true in several special cases, 
e.g., when the communication is deterministic, or when the number of rounds of communica- 
tion is limited. 

The proof of our aforementioned result, which settles this conjecture fully, is based on a new 
geometric statement regarding correlations in Gaussian space, related to a result of C. Borell 
(1985). To prove this geometric statement, we show that random projections of not-too-small 
sets in Gaussian space are close to a mixture of translated normal variables.

1 Introduction



Communication complexity is a much-studied topic in computational complexity, deriving its 
importance both from the basic nature of the questions it asks and the wide range of applications 
of its results, covering, for instance, lower bounds on circuit depth (see, e.g., [KW88]) and on query
times for static data structures (see, e.g., [MNSW95, Pat08]). In the basic setup, which is all that
concerns us here, each of two players, Alice and Bob, receives a binary string as input. Their goal 
is to compute some function of the two strings, using a protocol that involves exchanging a small 
number of bits. Since communication complexity is often applied as a lower bound technique, 
much of the work in the area attempts to rule out the existence of a nontrivial protocol. For many 
functions, this amounts to proving an $\Omega(n)$ lower bound on the number of bits any successful
protocol must exchange, n being the common length of Alice's and Bob's input strings. Proofs tend 
to be considerably more challenging, and more broadly applicable, when the protocol is allowed 
to be randomized and err with some small constant probability (such as 1/3) on each input.

For a detailed coverage of the basics of the field, as well as a number of applications, we refer 
the reader to the textbook of Kushilevitz and Nisan [KN97]. For the reader's convenience, we
review the most basic notions in Section 2.

In this paper, we focus specifically on the Gap-Hamming-Distance problem (henceforth abbreviated
as GHD), which was first formally studied by Indyk and Woodruff [IW03] in the context of
proving space lower bounds for the Distinct Elements problem in the data stream model. We also 
consider some closely related variants of GHD. 

The Problem and the Main Result. In the Gap-Hamming-Distance problem $\mathrm{GHD}_{n,t,g}$, Alice and
Bob receive binary strings $x \in \{0,1\}^n$ and $y \in \{0,1\}^n$, respectively. They wish to decide whether
$x$ and $y$ are "close" or "far" in the Hamming sense, with a certain gap separating the definitions of
"close" and "far." Specifically, the players must output 0 if $\Delta(x,y) \le t - g$ and 1 if $\Delta(x,y) > t + g$,
where $\Delta$ denotes Hamming distance; if neither of these holds, they may output either 0 or 1.
Clearly, this problem becomes easier as the gap $g$ increases. Of special interest is the case when
$t = n/2$ and $g = \Theta(\sqrt{n})$; these parameters are natural, and as we shall show later using elementary
reductions, understanding the complexity of the problem with these parameters leads to a
complete understanding of the problem for essentially all other gap sizes and threshold locations.
Furthermore, applications of GHD, such as the ones considered by Indyk and Woodruff [IW03],
need precisely this natural setting of parameters. Henceforth, we shall simply write "GHD" to
denote $\mathrm{GHD}_{n,n/2,\sqrt{n}}$.

Our main result states, simply, that this problem does not have a nontrivial protocol. Here is a 
somewhat informal statement; a fully formal version appears as Theorem 2.6.

Theorem 1.1 (Main Theorem, Informal). If a randomized protocol solves GHD, then it must communicate
a total of $\Omega(n)$ bits.

In fact, the technique we use to prove this theorem yields the stronger result that the same
$\Omega(n)$ hardness holds even if Alice and Bob are given uniformly random and independent inputs
in $\{0,1\}^n$. The cleanness of this "hard distribution" is potentially important in applications. We
state this result formally in Theorem 2.7.

Relation to Prior Work. Theorem 1.1 is the logical conclusion of a moderately long line of research.
This was begun in the aforementioned work of Indyk and Woodruff [IW03], who showed
a linear lower bound on the communication complexity of a somewhat artificial variant of GHD in
the one-way model, i.e., in the model where the communication is required to consist of just one
message from Alice to Bob. Woodruff [Woo04] soon followed up with an $\Omega(n)$ bound for GHD
itself, still in the one-way model; the proof used rather intricate combinatorial constructions and
computations. Jayram et al. [JKS08] later provided a rather different and much simpler proof, by a
reduction from the INDEX problem. Their reduction was geometric, in the sense that they exploited
a natural correspondence between Hamming space and Euclidean space; this correspondence has
proved fruitful in further work on the problem, including this work. Recently, Woodruff [Woo09]
and Brody and Chakrabarti [BC09] gave direct combinatorial proofs of the $\Omega(n)$ one-way bound.

All of this work left open an important question: what can be said about the complexity of GHD 
when two-way communication is allowed? It has been conjectured, at least since the formalization of 
the problem in 2003, that $\Omega(n)$ is still the right answer, i.e., that GHD has no nontrivial protocol,
irrespective of the communication pattern. 

Until 2009, our understanding of this matter was limited to two "folklore" results. Firstly, the 
deterministic communication complexity of $\mathrm{GHD}_{n,n/2,g}$ can be shown to be $\Omega(n)$, even allowing
two-way communication and a gap as large as $g = cn$, for a small enough constant $c$. This follows
by directly demonstrating that its communication matrix contains no large monochromatic rectangles
(see, e.g., [Woo07]). Secondly, a simple reduction from DISJOINTNESS to $\mathrm{GHD}_{n,n/2,g}$ shows
that its randomized (two-way) communication complexity is $\Omega(n/g)$; notice that the corresponding
bound for GHD (where $g = \sqrt{n}$) is $\Omega(\sqrt{n})$. Meanwhile, we have an upper bound of $O(n^2/g^2)$,
via the simple (and one-way) protocol that samples sufficiently many coordinates of x and y to 
give the right answer with high probability. It remained a significant challenge to improve upon 
either tradeoff, even for just two rounds of communication. 
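
The sampling upper bound mentioned above can be illustrated with a short sketch. This is only one plausible reading of "samples sufficiently many coordinates": the constant `c`, the fallback to the naive protocol, and the helper `make_pair` are assumptions introduced here for illustration and are not taken from the paper.

```python
import random

def sampling_protocol(x, y, g, rng, c=8):
    """One-way public-coin sketch; returns (output_bit, bits_sent)."""
    n = len(x)
    k = c * n * n // (g * g)          # number of sampled coordinates, O(n^2 / g^2)
    if k >= n:                        # sampling no longer helps: use the naive n-bit protocol
        distance = sum(a != b for a, b in zip(x, y))
        return (1 if 2 * distance > n else 0), n
    coords = [rng.randrange(n) for _ in range(k)]    # shared random coordinates
    message = [x[i] for i in coords]                 # Alice -> Bob: k bits
    mismatches = sum(bit != y[i] for bit, i in zip(message, coords))
    # Bob answers "far" iff more than half of the sampled coordinates disagree.
    return (1 if 2 * mismatches > k else 0), k

def make_pair(n, distance, rng):
    """Hypothetical helper: a random x and a y at exactly the given Hamming distance."""
    x = [rng.randrange(2) for _ in range(n)]
    flipped = set(rng.sample(range(n), distance))
    y = [(b ^ 1) if i in flipped else b for i, b in enumerate(x)]
    return x, y
```

With $n = 2000$ and $g = 400$, the sketch sends 200 bits rather than 2000; as $g$ shrinks towards $\sqrt{n}$ the sample size reaches $n$ and the saving disappears, which matches the $O(n^2/g^2)$ tradeoff.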

Recently, Brody and Chakrabarti [BC09] made progress on the conjecture, proving it for randomized
protocols with two-way communication, but only a constant number of rounds of communication.
In fact, they showed that in a $k$-round protocol, at least one message must have length
$n/2^{O(k^2)}$. They achieved this via a round elimination argument. At a high level, they showed that
if the first message in a GHD protocol is too short, the work done by the rest of the messages
can be used to solve a "smaller" instance of GHD, by exploiting some combinatorial properties of
Hamming space. More recently, Brody et al. [BCR+10] improved the bound to $\Omega(n/(k^2 \log k))$,
still using a round elimination argument, but exploiting geometric properties of Hamming and
Euclidean space instead. We refer the reader to the discussion in [BCR+10] for details, including a
comparison of the two arguments.

Our main theorem completes this picture, confirming the main outstanding conjecture about 
GHD. Moreover, a straightforward reduction (Prop. 4.4) yields the more general result that the
randomized complexity of $\mathrm{GHD}_{n,n/2,g}$ is $\Theta(\min\{n, n^2/g^2\})$. Our lower bound proof is significantly
different in approach from all of the aforementioned ones. We now give a high-level overview. 



The Technique. Part of the difficulty in establishing our result is that many of the known techniques
for proving communication complexity lower bounds seem unable to prove bounds better
than $\Omega(\sqrt{n})$. These include the classic rectangle-based methods of discrepancy and corruption,¹
for reasons described below. They also include certain linear algebraic approaches, such as the
factorization norms method of Linial and Shraibman [LS07] and the pattern matrix method of
Sherstov [She08], because these methods lower bound quantum communication complexity. The
trouble is that GHD does have a constant-error $O(\sqrt{n} \log n)$ quantum communication protocol, as
can be seen by combining a query complexity upper bound due to Nayak and Wu [NW99] with a
communication-to-query reduction, as in Buhrman et al. [BCW98] or Razborov [Raz02].

¹ We assume that the reader has some familiarity with these basic techniques in communication complexity, which
are discussed in detail in the textbook of Kushilevitz and Nisan [KN97]. Some authors use terms like "one-sided
discrepancy" and "rectangle bound" when describing the technique that we (following Beame et al. [BPSW06]) have
termed "corruption."

Instead, what does work is a suitable generalization of the corruption method. Recall that
the standard corruption method proceeds as follows. First, one observes that every protocol
that communicates $c$ bits induces a partition of the communication matrix into $2^c$ disjoint
near-monochromatic rectangles. In order to show a lower bound of $c$, one then needs to prove that any
rectangle containing at least a $2^{-c}$ fraction of the 1-inputs must also contain (or be "corrupted"
by) a not-much-smaller fraction of the 0-inputs (or vice versa). In other words, one shows that
large near-monochromatic rectangles do not exist, from which the desired lower bound follows. It
should be noted that proving such a property could be a challenging task. Indeed, this is the main
technical contribution of Razborov's proof of the $\Omega(n)$ lower bound on the randomized communication
complexity of the disjointness problem [Raz90].

This idea appears not to give a lower bound better than $\Omega(\sqrt{n})$ on the randomized communication
complexity of GHD because its communication matrix does contain "annoying" rectangles
that are both large and near-monochromatic. This can be seen, e.g., by considering all inputs $(x, y)$
with $x_i = 0$, $y_i = 1$ for $i \in \{1, 2, \ldots, 100\sqrt{n}\}$: the resulting rectangle contains a $2^{-\Theta(\sqrt{n})}$ fraction of
all 1-inputs (it is large), but a much smaller fraction of 0-inputs (it is nearly monochromatic).
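
The "annoying" rectangle can be examined exactly for small parameters. The sketch below is an illustration under stated assumptions: it fixes the first $k$ coordinates to $x_i = 0$, $y_i = 1$, uses the convention that 1-inputs have $\Delta > n/2 + \sqrt{n}$ and 0-inputs have $\Delta \le n/2 - \sqrt{n}$, and takes $k = 4\sqrt{n}$ instead of $100\sqrt{n}$ so that a small $n$ suffices. The function name `rectangle_fractions` is hypothetical.

```python
from fractions import Fraction
from math import comb, isqrt

def rectangle_fractions(n, k):
    """Exact fractions of all 1-inputs and of all 0-inputs that lie in the rectangle
    {(x, y) : x_i = 0 and y_i = 1 for the first k coordinates}.
    Assumes n is an even perfect square and k <= n."""
    s = isqrt(n)
    hi, lo = n // 2 + s, n // 2 - s

    def count(m, offset, pred):
        # Number of distance patterns on m free coordinates whose total distance
        # (offset + d) satisfies pred; each pattern is weighted by 2^m choices of x below.
        return sum(comb(m, d) for d in range(m + 1) if pred(offset + d))

    is_one = lambda d: d > hi
    is_zero = lambda d: d <= lo
    ones_all = 2**n * count(n, 0, is_one)
    zeros_all = 2**n * count(n, 0, is_zero)
    # Inside the rectangle the first k coordinates always differ, so distance = k + d'.
    ones_in = 2**(n - k) * count(n - k, k, is_one)
    zeros_in = 2**(n - k) * count(n - k, k, is_zero)
    return Fraction(ones_in, ones_all), Fraction(zeros_in, zeros_all)
```

For instance, `rectangle_fractions(400, 80)` returns a 1-input fraction that is at least $4^{-80}$ (the rectangle's share of all inputs) and a 0-input fraction that is smaller by many orders of magnitude.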

Our generalization considers not just 0-inputs and 1-inputs, but also a carefully selected set 
of "joker" inputs, whose corresponding outputs are immaterial. Loosely speaking, we show that 
if a large rectangle contains many more 1-inputs than 0-inputs, then the fraction of joker inputs 
it contains must be even larger than the fraction of 1-inputs it contains (by some constant factor, 
say 3/2). This property — call it the "joker property" — implies that even though annoying 
rectangles exist, their union cannot contain more than a constant fraction of the 1-inputs (say, 2/3). 
In particular, there is no way to partition the 1-inputs into $2^c$ near-monochromatic rectangles, and
a lower bound of c follows. 

This simple-sounding idea seems to have considerable power. Indeed, the method we have 
presented above can be seen as a special case of the ideas behind the "smooth rectangle bound" 
recently introduced by Klauck [Kla10] and systematized by Jain and Klauck [JK10]. Formally,
when we prove a communication lower bound using corruption-with-jokers as above, we are 
essentially lower bounding the smooth rectangle bound of the underlying function. For a careful 
understanding of this matter, based on linear programming duality, we refer the reader to Jain 
and Klauck [JK10].

Of course, there remains the task of proving the joker property referred to above. It turns out
that the statement we need boils down to roughly the following: for arbitrary sets $A, B \subseteq \{0,1\}^n$
that are not too small (say, of size at least $2^{0.99n}$), if $x \in_R A$ and $y \in_R B$, then $\Delta(x,y)$ is not too
concentrated around $n/2$; a precise statement appears as Corollary 3.8. The proof uses a Gaussian
noise correlation inequality (Theorem 3.5, proved using analytic methods); this inequality and its
proof are the main technical contributions of the paper and should be of independent interest.

Data Stream and Other Consequences. The original motivation for studying GHD was a specific
application to the Distinct Elements problem on data streams. Specifically, given a stream
(sequence) of $m$ elements, each from $[n] := \{1, 2, \ldots, n\}$, we wish to estimate, to within a $1 \pm \varepsilon$
factor, the number of distinct elements in it, while using space sublinear in $m$ and $n$. A long line
of research has culminated in a randomized algorithm [KNW10] that computes such an estimate
(failing with probability at most $1/3$, say) in one pass over the stream, using $O(\varepsilon^{-2} + \log(mn))$ bits
of space. A space lower bound of $\Omega(\log n)$ has been known for a while [AMS99] and is easily seen
to apply to multi-pass algorithms. But the dependence of the lower bound on $\varepsilon$ is a longer story.

An easy reduction (implicit in Indyk and Woodruff [IW03]) shows that a lower bound of
$\Omega(\varphi(n,k))$ on the maximum message length of a $(2k-1)$-round protocol for GHD would imply a
$\Omega(\varphi(\varepsilon^{-2}, k))$ space lower bound on $k$-pass algorithms for the Distinct Elements problem. Thus, the
one-way $\Omega(n)$ lower bound for GHD implied a tight $\Omega(\varepsilon^{-2})$ lower bound for one-pass streaming
algorithms. The results of Brody and Chakrabarti [BC09] and Brody et al. [BCR+10] extended this
to $p$-pass algorithms, giving lower bounds of $\Omega(\varepsilon^{-2}/2^{O(p^2)})$ and $\Omega(\varepsilon^{-2}/(p^2 \log p))$, respectively.

Our main result improves this pass/space tradeoff, giving a space lower bound of $\Omega(\varepsilon^{-2}/p)$.
As is easy to see, this is tight up to factors logarithmic in $m$ and $n$. Further, since the communication
lower bound for GHD can be shown to hold under a uniform input distribution, this
space lower bound can be shown to hold even for rather benign models of random uncorrelated
data [Woo09].

Suitable reductions from GHD imply similar space lower bounds for several other data stream 
problems, such as estimating frequency moments [Woo04] and empirical entropy [CCM10]. One
can also derive appropriate lower bounds for a certain class of distributed computing problems
known as functional monitoring [ABC09]. We note that the second frequency moment (equivalently,
the Euclidean norm) can be interpreted as the self-join size of a table in a database, and is
an especially important primitive needed in many numerical streaming tasks such as regression 
and low-rank approximation. 



Subsequent Developments. Since the preliminary announcement of our results [CR11], there
has been much additional research related to GHD. One line of research has provided alternative
proofs of our main result. Vidick [Vid11] gave a proof that followed the same overall outline
as ours, but had an alternative proof of the joker property, based on matrix-analytic and second
moment methods. More recently, Sherstov [She12] gave a proof that changed the outline itself,
working with a closely related problem called GAP-ORTHOGONALITY that has the advantage of 
being amenable to the basic corruption method. Further, by using an inequality due to Talagrand, 
Sherstov was able to work with the discrete problem directly rather than passing to Gaussian 
space. 

Other lines of research have applied the optimal $\Omega(n)$ bound on the communication complexity
of GHD to obtain results on a diverse array of topics, including differential privacy [MMP+10],
distributed functional monitoring [WZ12], property testing [BBM11], and data aggregation in networks
[KO11]. Furthermore, Woodruff and Zhang [WZ12] have given a new proof of optimal
multi-pass space lower bounds for Distinct Elements without appealing to our lower bound for
GHD.



2 Corruption, a Generalization, and the Main Theorem 
2.1 Preliminaries 

Consider a communication problem given by a (possibly partial) function $f : X \times Y \to \{0, 1, *\}$;
we let $f$ take the value "$*$" at inputs for which we do not care about the output given. For a
communication protocol, $P$, involving two players, Alice and Bob, we write $P(x,y)$ to denote the
output of $P$ when Alice receives $x \in X$ and Bob receives $y \in Y$. If $P$ is randomized, this is a
random variable. We say that $P$ computes $f$ with error at most $\varepsilon$ if

$$ \forall (x,y) \in X \times Y : \quad f(x,y) \ne * \;\Longrightarrow\; \Pr[P(x,y) \ne f(x,y)] \le \varepsilon . $$

When the function $f$ is understood from the context, we use $\mathrm{err}(P)$ to denote $\inf\{\varepsilon : P$ computes
$f$ with error at most $\varepsilon\}$. For a deterministic protocol $P$ and a distribution $\mu$ on $X \times Y$, we define

$$ \mathrm{err}_\mu(P) := \Pr_{(x,y) \sim \mu}\bigl[ f(x,y) \ne * \,\wedge\, P(x,y) \ne f(x,y) \bigr] . $$

For a protocol $P$, let $\mathrm{cost}(P)$ denote the worst-case number of bits communicated by $P$. We
let $R_\varepsilon(f)$ and $D_{\mu,\varepsilon}(f)$ denote the $\varepsilon$-error randomized and $\varepsilon$-error $\mu$-distributional communication
complexities of $f$, respectively; i.e.,

$$ R_\varepsilon(f) = \min\{\mathrm{cost}(P) : P \text{ is a randomized protocol for } f \text{ with } \mathrm{err}(P) \le \varepsilon\} ; $$
$$ D_{\mu,\varepsilon}(f) = \min\{\mathrm{cost}(P) : P \text{ is a deterministic protocol for } f \text{ with } \mathrm{err}_\mu(P) \le \varepsilon\} . $$

We also put $R(f) = R_{1/3}(f)$ and $D_\mu(f) = D_{\mu,1/3}(f)$.

2.2 Rectangles and Corruption 

Consider a two-player communication problem given by a function $f : X \times Y \to Z$. A set $R \subseteq
X \times Y$ is said to be a rectangle if $R = X' \times Y'$ for some $X' \subseteq X$ and $Y' \subseteq Y$. A fundamental property
of communication protocols is the following.

Fact 2.1 (Rectangle property; see, e.g., [KN97]). Let $P$ be a deterministic communication protocol that
takes inputs in $X \times Y$, produces an output in $Z$, and communicates $c$ bits. Then, for all $z \in Z$, there exist
$2^c$ pairwise disjoint rectangles $R_{1,z}, \ldots, R_{2^c,z}$ such that

$$ \forall (x,y) \in X \times Y : \quad P(x,y) = z \;\Longleftrightarrow\; (x,y) \in \bigcup_{i=1}^{2^c} R_{i,z} . $$

The rectangles $R_{1,z}, \ldots, R_{2^c,z}$ are called the $z$-rectangles of $P$.

Let us focus on problems with Boolean output, i.e., $Z = \{0,1\}$. The discrepancy method for
proving lower bounds on $R(f)$ consists of choosing a suitable distribution $\mu$ on $X \times Y$ and showing
that for every rectangle $R$, the quantity $|\mu(R \cap f^{-1}(0)) - \mu(R \cap f^{-1}(1))|$ is "exponentially" small.

For several functions, this method is unable to prove a strong enough lower bound; the canonical
example is DISJ. A generalization that handles DISJ, and several other functions, is the corruption
method [Raz90, Kla03, BPSW06], which consists of showing, instead, that for every "large" rectangle
$R$, we have $\alpha \mu_1(R) \le \mu_0(R)$, for a constant $\alpha > 0$, where $\mu_i$ is a probability distribution on
$f^{-1}(i)$, for $i \in \{0,1\}$. Intuitively, we are arguing that any large rectangle that contains many 1s
must be corrupted by the presence of many 0s. The largeness of $R$ is often enforced indirectly by
writing the inequality in the following manner, where $m$ typically grows with $|X|$ and $|Y|$:

$$ \exists\, \alpha_0, \alpha_1 > 0 \;\; \forall R \text{ rectangular} : \quad \alpha_1 \mu_1(R) \le \alpha_0 \mu_0(R) + 2^{-m} . \qquad (1) $$

An inequality of this form allows us to conclude an $\Omega(m)$ lower bound on $D_{\nu,\varepsilon}(f)$ for a suitable
distribution $\nu$ and sufficiently small error $\varepsilon > 0$. (Rather than present a full proof, we note that
this follows as a special case of Theorem 2.2 below.) By the easy direction of Yao's lemma, this
implies $R_\varepsilon(f) = \Omega(m)$.

2.3 Corruption With Jokers, and the Smooth Rectangle Bound 

We now introduce a suitable generalization of the corruption method, which, as we shall soon see,
implies that $D_{\mu,\varepsilon}(\mathrm{GHD}) = \Omega(n)$, for suitable $\mu$ and $\varepsilon$. The corresponding technical challenge is met
using a new Gaussian noise correlation inequality that we prove in Section 3. Our generalization
can be captured within the very recent smooth rectangle bound framework [Kla10, JK10]. However,
we believe that there is merit in singling out the method we use, because it appears wieldier than
the smooth rectangle bound, which is more technically involved.

The key idea is that, in addition to the distributions $\mu_0$ and $\mu_1$ on the 0-inputs and 1-inputs to
$f$, we consider an auxiliary distribution $\mu_+$ on "joker" inputs.² Strictly speaking, we just have a
"joker distribution" and it does not matter how $\mu_+$ relates to $\mu_0$ and $\mu_1$, but it is crucial that
the inequality below gives a negative weight to $\mu_+$, and is therefore a weakening of (1).

$$ \alpha_1 \mu_1(R) - \alpha_+ \mu_+(R) \le \alpha_0 \mu_0(R) + 2^{-m} . \qquad (2) $$

We shall in fact allow a little flexibility in our choice of $\mu_0$ and $\mu_1$ by requiring only that these
be supported "mostly" on 0-inputs and 1-inputs. Also, we shall extend our theory to partial
functions, since GHD is one. The next theorem captures our lower bound technique.

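Theorem 2.2. For all $\alpha_0, \alpha_1, \alpha_+, \varepsilon > 0$ such that $\varepsilon < (\alpha_1 - \alpha_+)/(\alpha_0 + \alpha_1)$, there exist $\beta \in \mathbb{R}$ and $\varepsilon' > 0$
such that the following holds. Let $f : X \times Y \to \{0,1,*\}$ be a partial function. Let $A_0 = f^{-1}(0)$ and
$A_1 = f^{-1}(1)$. Suppose that there exist distributions $\mu_0, \mu_1, \mu_+$ on $X \times Y$, and a real number $m > 0$ such
that

(1) for $i \in \{0,1\}$, $\mu_i$ is mostly supported on $A_i$, i.e., $\mu_i(A_i) \ge 1 - \varepsilon$, and

(2) inequality (2) holds for all rectangles $R \subseteq X \times Y$.

Then, for the distribution $\nu := (\alpha_0 \mu_0 + \alpha_1 \mu_1)/(\alpha_0 + \alpha_1)$, we have $D_{\nu,\varepsilon'}(f) \ge m + \beta$. In particular, we
have $R_{\varepsilon'}(f) \ge m + \beta$.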

Proof. Consider a deterministic protocol $P$ that computes $f$ with some error $\varepsilon'$ (to be fixed later)
under $\nu$, and uses $c$ bits of communication. Let $R_1, \ldots, R_{2^c} \subseteq X \times Y$ be the disjoint 1-rectangles of
$P$, as given by Fact 2.1. Let $S_1 = \bigcup_{i=1}^{2^c} R_i$ and $S_0 = (X \times Y) \setminus S_1$. Notice that $S_i$ is exactly the set of
inputs on which $P$ outputs $i$. Thus, for $i \in \{0,1\}$, we have

$$ \mathrm{err}_{\mu_i}(P) = \mu_i(S_i \cap A_{1-i}) + \mu_i(S_{1-i} \cap A_i) \ge \mu_i(S_{1-i}) - \varepsilon , \qquad (3) $$

where the last step uses Condition (1).

Instantiating inequality (2) with each $R_i$ and summing the resulting inequalities, we get

$$ \alpha_1 \mu_1(S_1) - \alpha_+ \mu_+(S_1) \le \alpha_0 \mu_0(S_1) + 2^c \cdot 2^{-m} . \qquad (4) $$

Noting that $\mu_1(S_1) = 1 - \mu_1(S_0)$ and applying (3) to the $\mu_0$ and $\mu_1$ terms in (4), we obtain

$$ \alpha_1 \bigl(1 - \mathrm{err}_{\mu_1}(P) - \varepsilon\bigr) - \alpha_+ \mu_+(S_1) \le \alpha_0 \bigl(\mathrm{err}_{\mu_0}(P) + \varepsilon\bigr) + 2^{c-m} . $$

Further, noting that $\mu_+(S_1) \le 1$, and rearranging terms, we obtain

$$ \alpha_1 - \alpha_+ \le (\alpha_0 + \alpha_1)\varepsilon + \bigl(\alpha_0 \, \mathrm{err}_{\mu_0}(P) + \alpha_1 \, \mathrm{err}_{\mu_1}(P)\bigr) + 2^{c-m}
= (\alpha_0 + \alpha_1)\varepsilon + (\alpha_0 + \alpha_1)\, \mathrm{err}_\nu(P) + 2^{c-m} . $$

Using $\mathrm{err}_\nu(P) \le \varepsilon'$ and rearranging further, we get

$$ 2^{c-m} \ge \alpha_1 - \alpha_+ - (\alpha_0 + \alpha_1)(\varepsilon + \varepsilon') . $$

By virtue of the upper bound on $\varepsilon$, we may choose $\varepsilon'$ small enough to make the right-hand side of
the above inequality positive, and equal to $2^\beta$, say. Doing so gives us $c \ge m + \beta$, as desired.

Notice that the "hard distribution" $\nu$ is explicitly specified, once the distributions involved in
Condition (2) are made explicit. □

² In the sequel, when we apply the technique to GHD, $\mu_0$, $\mu_1$, and $\mu_+$ will be sharply concentrated on pairwise
disjoint sets of inputs, which we can think of as the interesting 0-inputs, the interesting 1-inputs, and the joker inputs,
respectively.

We could, alternatively, have proved Theorem 2.2 by demonstrating that the given conditions
imply that the smooth rectangle bound of $f$ is $\Omega(m)$. We have chosen to give the above proof
instead, because it is more elementary, avoiding the technical details of the latter bound, and
because it was discovered independently by the first named author.



2.4 Application to GHD: the Main Theorem 

The Gap-Hamming-Distance problem is formalized as the computation of the partial function
$\mathrm{GHD}_{n,t,g} : \{0,1\}^n \times \{0,1\}^n \to \{0,1,*\}$ defined as follows.

$$ \mathrm{GHD}_{n,t,g}(x,y) = \begin{cases} 0, & \text{if } \Delta(x,y) \le t - g, \\ 1, & \text{if } \Delta(x,y) > t + g, \\ *, & \text{otherwise.} \end{cases} $$

It will be useful to have some flexibility in the choice of the location of the threshold, $t$, and the
size of the gap, $g$. It is not hard to see that all settings with $t \in \Omega(n) \cap (n - \Omega(n))$ and $g = \Theta(\sqrt{n})$
lead to "equally hard" problems, asymptotically; we prove this formally in Lemma 4.2.
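
As a concrete reading of this definition, the partial function can be written out directly. This is a minimal sketch: the name `ghd`, the use of `None` for the "$*$" value, and the boundary convention ($\le t - g$ on the 0-side, $> t + g$ on the 1-side) are assumptions made for illustration.

```python
def hamming_distance(x, y):
    """Number of coordinates in which the equal-length bit sequences x and y differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def ghd(x, y, t, g):
    """Partial function GHD_{n,t,g}: 0 for "close", 1 for "far", None for '*'."""
    d = hamming_distance(x, y)
    if d <= t - g:
        return 0
    if d > t + g:
        return 1
    return None  # inside the gap: any output is acceptable
```

For example, with $n = 16$, $t = 8$, $g = 4$, a pair at distance 3 maps to 0, a pair at distance 13 maps to 1, and a pair at distance 8 falls in the gap.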

Rather than working with $\mathrm{GHD}_{n,n/2,\sqrt{n}}$ directly, it proves convenient to consider the partial
function $f_b = \mathrm{GHD}_{n,\, n/2 - b\sqrt{n},\, \sqrt{2n}}$, for some large enough constant $b$ to be determined later. We
shall now come up with distributions and constants that satisfy the conditions of Theorem 2.2.
Condition (1) turns out to be easy to verify, and verifying Condition (2), as mentioned above, is a
significant technical challenge that we deal with in Section 3.

Definition 2.3. For $p \in [-1, 1]$, let $\xi_p$ denote the distribution of $(x,y) \in \{0,1\}^n \times \{0,1\}^n$ defined
by the following randomized procedure: pick $x \in_R \{0,1\}^n$ uniformly at random, and then pick $y$
by independently flipping each bit of $x$ with probability $(1-p)/2$. Notice that $\xi_0$ is the uniform
distribution on $\{0,1\}^n \times \{0,1\}^n$.
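
The randomized procedure in Definition 2.3 translates directly into a sampler. The sketch below is illustrative only; the function name `sample_xi` and the explicit `rng` argument are choices made here, not notation from the paper.

```python
import random

def sample_xi(n, p, rng):
    """Draw (x, y): x uniform in {0,1}^n, y obtained by flipping each bit of x
    independently with probability (1 - p) / 2, for p in [-1, 1]."""
    flip_probability = (1 - p) / 2
    x = [rng.randrange(2) for _ in range(n)]
    y = [bit ^ int(rng.random() < flip_probability) for bit in x]
    return x, y
```

Under this procedure the Hamming distance of a sample is binomial with mean $n(1-p)/2$, so $p = 1$ gives $y = x$, $p = -1$ gives the bitwise complement, and $p = 0$ gives independent uniform strings.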

We shall need the following two lemmas. The first of these follows easily from standard tail 
estimates for the binomial distribution, or even just the Chebyshev bound; we omit its proof. The 
second is formally proved at the end of Section 3.

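Lemma 2.4. For all $\varepsilon > 0$ there exists $b > 0$ such that, for large enough $n$, we have

$$ \xi_{4b/\sqrt{n}}(A_0) = \Pr_{(x,y) \sim \xi_{4b/\sqrt{n}}}\Bigl[ \Delta(x,y) \le \tfrac{n}{2} - (b + \sqrt{2})\sqrt{n} \Bigr] \ge 1 - \varepsilon , \quad \text{and} $$
$$ \xi_0(A_1) = \Pr_{(x,y) \sim \xi_0}\Bigl[ \Delta(x,y) > \tfrac{n}{2} - (b - \sqrt{2})\sqrt{n} \Bigr] \ge 1 - \varepsilon , $$

where $A_0 = f_b^{-1}(0)$ and $A_1 = f_b^{-1}(1)$. □

Lemma 2.5. For all $b > 0$ there exists $\delta > 0$ such that, for large enough $n$, we have

$$ \forall R \subseteq \{0,1\}^n \times \{0,1\}^n \text{ rectangular} : \quad \tfrac{1}{2}\bigl( \xi_{4b/\sqrt{n}}(R) + \xi_{-4b/\sqrt{n}}(R) \bigr) \ge \tfrac{2}{3}\, \xi_0(R) - 2^{-\delta n} . $$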
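
Lemma 2.4 lends itself to an exact numerical check, since under $\xi_p$ the Hamming distance is binomial with parameters $n$ and $(1-p)/2$. The sketch below assumes the thresholds $n/2 - (b \pm \sqrt{2})\sqrt{n}$ for $f_b$; the function names and the sample values $n = 10000$, $b = 4$ are illustrative choices.

```python
from math import exp, floor, lgamma, log, sqrt

def binom_cdf(n, q, k):
    """P[Bin(n, q) <= k] for 0 < q < 1, with each term computed in log-space."""
    total = 0.0
    for d in range(0, min(k, n) + 1):
        log_pmf = (lgamma(n + 1) - lgamma(d + 1) - lgamma(n - d + 1)
                   + d * log(q) + (n - d) * log(1 - q))
        total += exp(log_pmf)
    return total

def lemma_2_4_probabilities(n, b):
    """Return (xi_{4b/sqrt(n)}(A_0), xi_0(A_1)) for f_b, assuming 4b/sqrt(n) < 1."""
    root = sqrt(n)
    p = 4 * b / root
    zero_threshold = floor(n / 2 - (b + sqrt(2)) * root)   # 0-inputs: distance <= this
    one_threshold = floor(n / 2 - (b - sqrt(2)) * root)    # 1-inputs: distance > this
    prob_zero = binom_cdf(n, (1 - p) / 2, zero_threshold)
    prob_one = 1.0 - binom_cdf(n, 0.5, one_threshold)
    return prob_zero, prob_one
```

With $n = 10000$ and $b = 4$, both thresholds sit roughly five standard deviations from the respective means, so both probabilities are very close to 1.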

To derive the lower bound on $R(\mathrm{GHD})$, we put $m = \delta n$, $\mu_0 = \xi_{4b/\sqrt{n}}$, $\mu_1 = \xi_0$, $\mu_+ = \xi_{-4b/\sqrt{n}}$,
$\varepsilon = 1/8$, $\alpha_1 = 2/3$, and $\alpha_0 = \alpha_+ = 1/2$. Note that this choice of constants satisfies
$\varepsilon < (\alpha_1 - \alpha_+)/(\alpha_0 + \alpha_1) = 1/7$. By Lemmas 2.4 and 2.5, we see that Conditions (1) and (2),
respectively, of Theorem 2.2 are met; the inequality in Lemma 2.5 is easily seen to be the
corresponding instantiation of (2).

Thus, applying Theorem 2.2, we conclude that there exist absolute constants $\varepsilon', \delta, b > 0$ and
$\beta \in \mathbb{R}$ such that, for large enough $n$, we have $R_{\varepsilon'}(f_b) \ge \delta n + \beta$. Combining this with Lemma 4.2
(proved in Section 4) to adjust for the slightly off-center threshold and the size of the gap, and
applying standard error reduction techniques, we obtain the following asymptotically optimal
lower bound for GHD.
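
The arithmetic behind this choice of constants can be checked with exact rationals. The sketch below takes the constants to be $\varepsilon = 1/8$, $\alpha_1 = 2/3$, $\alpha_0 = \alpha_+ = 1/2$ (our reading of the parameters above) and evaluates the right-hand side $\alpha_1 - \alpha_+ - (\alpha_0 + \alpha_1)(\varepsilon + \varepsilon')$ from the proof of Theorem 2.2; the function names and the sample value $\varepsilon' = 1/112$ are illustrative assumptions.

```python
from fractions import Fraction
from math import log2

def max_eps_prime(a0, a1, a_plus, eps):
    """Supremum of eps' for which a1 - a_plus - (a0 + a1)(eps + eps') stays positive."""
    return (a1 - a_plus) / (a0 + a1) - eps

def margin(a0, a1, a_plus, eps, eps_prime):
    """Right-hand side of the final inequality in the proof of Theorem 2.2."""
    return a1 - a_plus - (a0 + a1) * (eps + eps_prime)

a0 = a_plus = Fraction(1, 2)
a1 = Fraction(2, 3)
eps = Fraction(1, 8)

slack = max_eps_prime(a0, a1, a_plus, eps)        # 1/7 - 1/8
rhs = margin(a0, a1, a_plus, eps, Fraction(1, 112))
beta = log2(float(rhs))                           # additive constant in c >= m + beta
```

Under these assumptions any $\varepsilon' < 1/56$ is admissible, and choosing $\varepsilon' = 1/112$ gives a right-hand side of $1/96$, i.e., $\beta = -\log_2 96$.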

Theorem 2.6 (Main Theorem). $R(\mathrm{GHD}_{n,n/2,\sqrt{n}}) = \Omega(n)$. □

In applications of a communication lower bound, it is often helpful to have a good understand- 
ing of the "hard input distribution" that achieves the lower bound. One slightly unsatisfactory 
aspect of our proof above is that the hard distribution for GHD that it implies is not too clean. With 
a little additional work, however, we can show that the uniform input distribution is hard for GHD, 
once we require a small enough error bound. This is stated in the following theorem, whose proof
appears in a later section.

Theorem 2.7 (Hardness Under Uniform Distribution). There exists an absolute constant $\varepsilon > 0$ for
which $D_{\xi_0,\varepsilon}(\mathrm{GHD}_{n,n/2,\sqrt{n}}) = \Omega(n)$.

3 An Inequality on Correlation under Gaussian Noise 

We now turn to the proof of Lemma 2.5, for which we need some technical machinery that we
now develop. We begin with some preliminaries.

Some Probability Distributions. Let $\mu$ denote the uniform (Haar) distribution on $S^{n-1}$, the
unit sphere in $\mathbb{R}^n$. Let $\gamma$ denote the standard Gaussian distribution on $\mathbb{R}$, with density function
$(2\pi)^{-1/2} e^{-x^2/2}$, and let $\gamma^n$ denote the $n$-dimensional standard Gaussian distribution, with density
$(2\pi)^{-n/2} e^{-\|x\|^2/2}$. For a set $A \subseteq \mathbb{R}^n$, when we write, e.g., $\gamma^n(A)$, we tacitly assume that $A$ is
measurable. For a set $A \subseteq \mathbb{R}^n$ we denote by $\gamma^n|_A$ the distribution $\gamma^n$ conditioned on being in $A$. We
say that a pair $(x, y)$ is an $\eta$-correlated Gaussian pair if its distribution is that obtained by choosing
$x$ from $\gamma^n$ and then setting $y = \eta x + \sqrt{1 - \eta^2}\, z$, where $z$ is an independent sample from $\gamma^n$. It
is easy to verify that if $(x,y)$ is an $\eta$-correlated Gaussian pair, then so is $(y, x)$; in particular, $y$ is
distributed as $\gamma^n$.
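
The construction of a correlated Gaussian pair is easy to simulate. The following sketch uses only the standard library; the function name `correlated_gaussian_pair` and the explicit `rng` argument are illustrative choices.

```python
import math
import random

def correlated_gaussian_pair(n, eta, rng):
    """Sample x from the n-dimensional standard Gaussian and set
    y = eta * x + sqrt(1 - eta^2) * z, with z an independent standard Gaussian."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = math.sqrt(1.0 - eta * eta)
    y = [eta * a + scale * b for a, b in zip(x, z)]
    return x, y
```

Each coordinate of $y$ has variance $\eta^2 + (1 - \eta^2) = 1$ and covariance $\eta$ with the matching coordinate of $x$, which is the symmetry the text refers to; empirical averages over many coordinates should reflect both facts.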

Relative Entropy. We recall some basic information theory for continuous probability distributions.
For clarity, we eschew a fully rigorous treatment (which would introduce a considerable
amount of extra complexity through its formalism) and instead refer the interested reader to
the textbook of Gray [Gra90]. Given two probability distributions $P$ and $Q$, we define the relative
entropy of $P$ with respect to $Q$ as

$$ D(P \,\|\, Q) = \int P(x) \ln\bigl(P(x)/Q(x)\bigr)\, dx . $$

It is well known (and not difficult to show) that the relative entropy is always nonnegative and is
zero iff the two distributions are essentially equal. We will also need Pinsker's inequality, which
says that the statistical distance between two distributions $P$ and $Q$ is at most $\sqrt{D(P \,\|\, Q)/2}$ (see,
e.g., [Gra90, Lemma 5.2.8]). Since we will only consider the relative entropy with respect to the
Gaussian distribution, we introduce the notation

$$ D_\gamma(X) := D(P \,\|\, \gamma) , $$

where $X$ is a real-valued random variable with distribution $P$. We define $D_{\gamma^n}$ similarly. These
quantities can be thought of as measuring the "distance from Gaussianity." They can be seen,
in some precise sense, as additive inverses of entropy, and as such satisfy many of the familiar
properties of entropy. For instance, it is easy to verify that for any sequence of random variables
$X_1, \ldots, X_n$ we have the chain rule

$$ D_{\gamma^n}(X_1, \ldots, X_n) = \sum_{k=1}^{n} D_\gamma(X_k \mid X_1, \ldots, X_{k-1}) , $$

where, for random variables $X$ and $Y$, we use the notation $D_\gamma(X \mid Y)$ to denote the expectation over
$Y$ of the distance from Gaussianity of $X \mid Y$.
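
A simple case in which both sides of Pinsker's inequality have closed forms is a unit-variance Gaussian shifted by $m$. The sketch below relies on two standard facts that are not stated in the text: the relative entropy of $N(m,1)$ with respect to $\gamma$ is $m^2/2$ (in nats), and the statistical distance between $N(m,1)$ and $\gamma$ is $\mathrm{erf}\bigl(|m|/(2\sqrt{2})\bigr)$. The function names are illustrative.

```python
from math import erf, sqrt

def distance_from_gaussianity_of_shift(m):
    """D_gamma(X) for X ~ N(m, 1), in nats."""
    return m * m / 2.0

def statistical_distance_of_shift(m):
    """Total variation distance between N(m, 1) and N(0, 1)."""
    return erf(abs(m) / (2.0 * sqrt(2.0)))

def pinsker_bound(relative_entropy):
    """Upper bound on statistical distance given by Pinsker's inequality."""
    return sqrt(relative_entropy / 2.0)
```

Here the bound evaluates to $|m|/2$, while the true distance is about $0.383$ at $m = 1$ and saturates at 1 for large shifts, so the inequality is informative only when the distance from Gaussianity is small.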

3.1 Projections of Sets in Gaussian Space 

Our main technical result is a statement about the projections of sets in Gaussian space. More
precisely, let $A \subseteq \mathbb{R}^n$ be any set of not too small measure, say, $\gamma^n(A) \ge \exp(-\delta n)$ for some
constant $\delta > 0$. What can we say about the projections (or one-dimensional marginals) of $\gamma^n|_A$,
i.e., the set of distributions of $\langle \gamma^n|_A, y \rangle$ as the (fixed) vector $y$ ranges over the unit sphere $S^{n-1}$?

Related questions have appeared in the literature. The first is in work by Sudakov [Sud78]
and Diaconis and Freedman [DF84] (see also [Bob03] for a more recent exposition) who showed
that for any random variable in $\mathbb{R}^n$ with zero mean and identity covariance matrix whose norm is
concentrated around $\sqrt{n}$, almost all its projections are close to the standard normal distribution.
A second related result is by Klartag [Kla07] who, building on the previous result but with considerable
additional work, showed that almost all projections of the uniform distribution over a
(properly normalized) convex body are close to the standard normal distribution. (For the special
case of the cube $[-1, 1]^n$, this essentially follows from the central limit theorem.)

Our setting is different as we do not put any restrictions on the set A (such as convexity) apart 
from its measure not being too small (and clearly without any requirement on the measure one 
cannot say anything about its projections). Another important difference is that in our setting the 
projections are not necessarily normal. To see why, take $A = \{x : |x_1| \ge t\}$ for $t \approx \sqrt{\delta n}$, a set with
Gaussian measure roughly $\exp(-\delta n)$, half of which is on vectors with $x_1 \approx t$ and the other half on
vectors with $x_1 \approx -t$. It follows that the projection of $\gamma^n|_A$ on a unit vector $y$ is distributed more
or less like the mixture of two normal variables, one centered around $t y_1$ and the other centered
around $-t y_1$, both with variance 1. For unit vectors $y$ with $|y_1| \ge 1/\sqrt{\delta n}$ (a set of measure about
$\exp(-1/\delta)$), this distribution is very far from any normal distribution.
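The two-bump behaviour in this example can be observed numerically. The sketch below is only an illustration under assumed parameters ($t = 3$ and $y_1 = 0.5$ are arbitrary choices, not values from the text): it samples the projection $\langle x, y \rangle$ for $x$ conditioned on $|x_1| \ge t$ and compares the mass near zero with that of a centered Gaussian of the same variance.

```python
import math
import random
from statistics import NormalDist

def sample_projection(t, y1, rng):
    """One sample of <x, y> for x ~ gamma^n conditioned on |x_1| >= t.

    Only y_1 matters: the other coordinates of x are unconditioned
    Gaussians, so together they contribute N(0, 1 - y1^2).
    """
    std = NormalDist()
    tail = 1.0 - std.cdf(t)
    u = tail * (1.0 - rng.random())   # uniform in (0, tail]
    x1 = -std.inv_cdf(u)              # standard normal conditioned on x1 >= t
    if rng.random() < 0.5:
        x1 = -x1                      # the two tails are equally likely
    return y1 * x1 + math.sqrt(1.0 - y1 * y1) * rng.gauss(0.0, 1.0)

def mass_near_zero(samples, width=0.5):
    """Fraction of samples falling in (-width, width)."""
    return sum(1 for s in samples if abs(s) < width) / len(samples)

rng = random.Random(0)
samples = [sample_projection(t=3.0, y1=0.5, rng=rng) for _ in range(20000)]
variance = sum(s * s for s in samples) / len(samples)  # mean is 0 by symmetry
# Mass that a centered Gaussian of the same variance puts on (-0.5, 0.5).
gaussian_mass = 2.0 * NormalDist(0.0, math.sqrt(variance)).cdf(0.5) - 1.0
print(mass_near_zero(samples), gaussian_mass)
```

The empirical mass near zero comes out well below the Gaussian value, which is the sense in which the projection is far from normal.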

Our main theorem below shows that the general situation is similar: for any set $A$ of not too
small measure, almost all projections of $\gamma^n|_A$ are close to being mixtures of translated normal
variables of variance 1. One implication of this (which is essentially all we will use later) is that
for any $A \subseteq \mathbb{R}^n$ of not too small measure, and $B \subseteq S^{n-1}$ whose measure is also not too small, the
inner product $\langle x, y \rangle$ for $x$ chosen from $\gamma^n|_A$ and $y$ chosen uniformly from $B$ is not too concentrated
around 0; in fact, it must be at least as "spread out" as $\gamma$ (and possibly much more).

Theorem 3.1. For all $\varepsilon, \delta > 0$ and large enough $n$, the following holds. Let $A \subseteq \mathbb{R}^n$ be such that
$\gamma^n(A) \ge e^{-\varepsilon^2 n}$. Then, for all but an $e^{-\delta n/36}$ measure of unit vectors $y \in S^{n-1}$, the distribution of $\langle x, y \rangle$
where $x \sim \gamma^n|_A$ is equal to the distribution of $\alpha X + Y$ for some $1 - \delta \le \alpha \le 1$ and random variables $X$
and $Y$ satisfying

$$D_\gamma(X \mid Y) \le \varepsilon.$$

The proof is based on the following two lemmas. The first one below shows that for any set
$A$ whose measure is not too small, and any orthonormal basis, most of the projections of $\gamma^n|_A$ on
the basis vectors are close to normal. In fact, the statement is somewhat stronger, as it allows us to
condition on previous projections (and this will be crucially used).

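Lemma 3.2. For all $\varepsilon > 0$ and large enough $n$ the following holds. For all sets $A \subseteq \mathbb{R}^n$ with
$\gamma^n(A) \ge e^{-\varepsilon^2 n}$ and all orthonormal bases $y_1, \ldots, y_n$, at least a $1 - \varepsilon$ fraction of the indices $k \in [n]$ satisfy

$$D_\gamma(P_k \mid P_1, \ldots, P_{k-1}) \le \varepsilon,$$

where $P_i = \langle u, y_i \rangle$, with $u \sim \gamma^n|_A$.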

Proof. By definition, $D_{\gamma^n}(\gamma^n|_A) = -\ln \gamma^n(A) \le \varepsilon^2 n$. Thus, since $(P_1, \ldots, P_n)$ is the vector $u$ written
in the orthonormal basis $y_1, \ldots, y_n$, using the chain rule for relative entropy, we have

$$\varepsilon^2 n \ge D_{\gamma^n}(\gamma^n|_A) = D_{\gamma^n}(P_1, \ldots, P_n) = \sum_{k=1}^{n} D_\gamma(P_k \mid P_1, \ldots, P_{k-1}).$$

Hence, for at least a $1 - \varepsilon$ fraction of indices $k$, we have $D_\gamma(P_k \mid P_1, \ldots, P_{k-1}) \le \varepsilon$. □
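The last step of this proof is a plain averaging argument: nonnegative terms whose sum is at most $\varepsilon^2 n$ cannot have more than $\varepsilon n$ entries exceeding $\varepsilon$. A small sketch, with a made-up list of divergences standing in for the conditional terms:

```python
def indices_close_to_normal(cond_divergences, eps):
    """Indices k whose conditional divergence is at most eps."""
    return [k for k, d in enumerate(cond_divergences) if d <= eps]

n, eps = 100, 0.1
# Hypothetical divergences: nonnegative, with total at most eps^2 * n.
divs = [0.11] * 9 + [0.0] * (n - 9)
assert sum(divs) <= eps * eps * n
good = indices_close_to_normal(divs, eps)
print(len(good) / n)  # fraction of indices with divergence at most eps
```

Any attempt to make more than $\varepsilon n$ terms exceed $\varepsilon$ would push the sum above the budget $\varepsilon^2 n$.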

The second lemma is due to Raz [Raz99] and shows that any not-too-small subset $B$ of the
sphere contains $n/2$ "nearly orthogonal" vectors. The idea of Raz's proof is the following. First,
a simple averaging argument shows that there is a not-too-small measure of vectors $y' \in S^{n-1}$
satisfying the property that the measure of $B$ inside the unit sphere formed by the intersection of
$S^{n-1}$ and the subspace orthogonal to $y'$ is not much smaller than $\mu(B)$. Second, by the isoperimetric
inequality, almost all vectors in $S^{n-1}$ are within distance $\delta$ of $B$. Together, we obtain a vector $y'$ as
above that is within distance $\delta$ of $B$. We take $y_{n/2}$ to be the closest vector in $B$ to $y'$ and repeat the
argument recursively with the intersection of $B$ and the subspace orthogonal to $y'$.

Definition 3.3. A sequence of unit vectors $y_1, \ldots, y_k \in S^{n-1}$ is called $\delta$-orthogonal if for all $i \in [k]$,
the squared norm of the projection of $y_i$ on $\mathrm{span}(y_1, \ldots, y_{i-1})$ is at most $\delta$.

Lemma 3.4 ([Raz99, Lemma 4.4]). For all $\delta > 0$ and large enough $n$, the following holds. Every $B \subseteq S^{n-1}$
of Haar measure $\mu(B) \ge e^{-\delta n/36}$ contains a $\delta$-orthogonal sequence $y_1, \ldots, y_{n/2} \in B$.

Proof of Theorem 3.1. Let $B \subseteq S^{n-1}$ be an arbitrary set of unit vectors of measure at least $e^{-\delta n/36}$.
We will prove the theorem by showing that at least one vector $y \in B$ satisfies the condition stated
in the theorem.

By Lemma 3.4, there is a sequence of $n/2$ vectors $y_1, \ldots, y_{n/2} \in B$ that is $\delta$-orthogonal. Let
$y_1^*, \ldots, y_{n/2}^*$ be their Gram-Schmidt orthogonalization, i.e., each $y_k^*$ is defined to be the projection
of $y_k$ on the space orthogonal to $\mathrm{span}(y_1, \ldots, y_{k-1})$. Notice that, by definition, we can write each
$y_k$ as

$$y_k = y_k^* + \sum_{i=1}^{k-1} a_{k,i}\, y_i^*$$

for some real coefficients $a_{k,i}$. Moreover, by assumption, $\|y_k^*\|^2 \ge 1 - \delta$.

Let $P_1, \ldots, P_{n/2}$ be the random variables representing $\langle x, y_1^*/\|y_1^*\| \rangle, \ldots, \langle x, y_{n/2}^*/\|y_{n/2}^*\| \rangle$ when
$x$ is chosen from $\gamma^n|_A$. By applying Lemma 3.2 to any completion of $y_1^*/\|y_1^*\|, \ldots, y_{n/2}^*/\|y_{n/2}^*\|$ to
an orthonormal basis, we see that there exists an index $k \in [n/2]$ for which

$$D_\gamma(P_k \mid P_1, \ldots, P_{k-1}) \le \varepsilon.$$

(In fact, at least $1 - 2\varepsilon$ of the indices $k$ satisfy this.) It remains to notice that we can write $\langle x, y_k \rangle$ as

$$\|y_k^*\|\, P_k + \sum_{i=1}^{k-1} a_{k,i}\, \|y_i^*\|\, P_i,$$

which satisfies the condition in the theorem, with $\alpha = \|y_k^*\| \ge \sqrt{1 - \delta} \ge 1 - \delta$, with $X$ taken to be $P_k$, and with
$Y$ taken to be the above sum. Here we are using the fact that $Y$ is a function of $P_1, \ldots, P_{k-1}$, which implies that
$D_\gamma(X \mid Y) \le D_\gamma(P_k \mid P_1, \ldots, P_{k-1})$ since conditioning cannot decrease relative entropy. □
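The decomposition of $\langle x, y_k \rangle$ used in this proof is an exact linear-algebra identity and can be checked directly. The sketch below uses plain Python lists and random unit vectors in a small dimension (the sizes and the seed are arbitrary, and the vectors are not required to be $\delta$-orthogonal, since the identity holds regardless).

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(ys):
    """Return (stars, coeffs) with y_k = stars[k] + sum_i coeffs[k][i] * stars[i]."""
    stars, coeffs = [], []
    for y in ys:
        row = [dot(y, s) / dot(s, s) for s in stars]  # coefficients a_{k,i}
        star = list(y)
        for a, s in zip(row, stars):
            star = [c - a * sc for c, sc in zip(star, s)]
        stars.append(star)
        coeffs.append(row)
    return stars, coeffs

rng = random.Random(1)
n, m = 8, 4
ys = []
for _ in range(m):
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(v, v))
    ys.append([c / norm for c in v])

stars, coeffs = gram_schmidt(ys)
norms = [math.sqrt(dot(s, s)) for s in stars]
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
P = [dot(x, s) / nrm for s, nrm in zip(stars, norms)]  # P_i = <x, y*_i/||y*_i||>

k = m - 1
lhs = dot(x, ys[k])
rhs = norms[k] * P[k] + sum(coeffs[k][i] * norms[i] * P[i] for i in range(k))
print(lhs, rhs)
```

The coefficient of $P_k$ is $\|y_k^*\|$, which is at most 1 because $y_k$ is a unit vector and $y_k^*$ is a projection of it.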

3.2 The Correlation Inequality 

We now turn to our main technical result, which is given by the following theorem. 

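Theorem 3.5. For all $c, \varepsilon > 0$ there exists a $\delta > 0$ such that for all large enough $n$ and $0 \le \eta \le c/\sqrt{n}$ the
following holds. For all sets $A, B \subseteq \mathbb{R}^n$ with $\gamma^n(A), \gamma^n(B) \ge e^{-\delta n}$ we have that

$$\frac{1}{2}\left( \Pr_{(x,y)\ \text{is}\ \eta\text{-correlated}}[x \in A \wedge y \in B] + \Pr_{(x,y)\ \text{is}\ (-\eta)\text{-correlated}}[x \in A \wedge y \in B] \right) \ge (1 - \varepsilon)\,\gamma^n(A)\,\gamma^n(B).$$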

As will become evident in the proof, pairs $(x, y) \in A \times B$ for which $|\langle x, y \rangle|$ is small contribute
much less to the left hand side than to the right hand side. Hence the theorem essentially
amounts to showing that $\langle x, y \rangle$ is not too concentrated around zero, and precisely such an
anti-concentration statement is given by Theorem 3.1.

We point out the following easy corollary (which is in fact equivalent to Theorem 3.5).

Corollary 3.6. For all $c, \varepsilon > 0$ there exists a $\delta > 0$ such that for all large enough $n$ and $0 \le \eta \le c/\sqrt{n}$ the
following holds. For any sets $A, B \subseteq \mathbb{R}^n$ with $\gamma^n(A), \gamma^n(B) \ge e^{-\delta n}$ where $A$ (or $B$) is centrally symmetric
(i.e., $A = -A$) we have that

$$\Pr_{(x,y)\ \text{is}\ \eta\text{-correlated}}[x \in A \wedge y \in B] \ge (1 - \varepsilon)\,\gamma^n(A)\,\gamma^n(B).$$

Remark. Without the symmetry assumption, this probability can be considerably smaller. For
instance, take $A$ and $B$ to be two opposing half-spaces, i.e., $A = \{x : x_1 \le -t\}$ and $B = \{x : x_1 \ge t\}$
for $t \approx \sqrt{\delta n}$. Then for $\eta = c/\sqrt{n}$, the probability above can be seen to be $e^{-\Theta(\sqrt{n})}\,\gamma^n(A)\,\gamma^n(B)$.
In fact, C. Borell [Bor85] showed that for any given $\gamma^n(A), \gamma^n(B)$ and any $0 < \eta < 1$, two opposing
half-spaces $A, B$ of the corresponding measures exactly achieve the minimum of the probability
above. It would be interesting to obtain a strengthening of Corollary 3.6 of a similar tight nature.
See [Bar01] for a short related discussion.

Recall that $\cosh(x) := \frac{1}{2}(e^x + e^{-x})$. The following technical claim shows that if the distribution
of $x$ is close to the normal distribution (in relative entropy) then the expectation of $\cosh(\alpha x + z)$ is
at least $e^{\alpha^2/2} - \varepsilon$. Notice that if $x$ is normal, this expectation is

$$\mathbb{E}_{x \sim \gamma}[\cosh(\alpha x + z)] = \cosh(z)\, \mathbb{E}_{x \sim \gamma}[\cosh(\alpha x)] = \cosh(z)\, e^{\alpha^2/2} \ge e^{\alpha^2/2},$$

where in the first equality we used the symmetry of $\gamma$, and the second follows from an easy direct
calculation of the integral (just complete the square in the exponent).
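The closed form for the Gaussian case can be confirmed numerically. The following sketch approximates the expectation with a trapezoidal rule on a truncated interval; the integration range, step count and the values of $\alpha$ and $z$ are arbitrary choices made for illustration.

```python
import math

def gaussian_expectation(f, lo=-12.0, hi=12.0, steps=48000):
    """Trapezoidal approximation of E_{x ~ N(0,1)}[f(x)] on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f(x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)

alpha, z = 0.7, 1.3
numeric = gaussian_expectation(lambda x: math.cosh(alpha * x + z))
closed_form = math.cosh(z) * math.exp(alpha * alpha / 2.0)
print(numeric, closed_form)
```

Since $\cosh(z) \ge 1$, the closed form is never below $e^{\alpha^2/2}$, which is the bound the claim extends to distributions that are merely close to normal.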

Claim 3.7. For all $\varepsilon, \alpha_0 > 0$ there exists a $\delta > 0$ such that for any probability distribution $P$ on the reals
satisfying $D_\gamma(P) < \delta$, any $z \in \mathbb{R}$, and any $0 \le \alpha \le \alpha_0$, we have

$$\mathbb{E}_{x \sim P}[\cosh(\alpha x + z)] \ge e^{\alpha^2/2} - \varepsilon.$$

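Proof. Set $M = \mathbb{E}_{x \sim \gamma}[(1 + \cosh(2\alpha_0 x))/\varepsilon]$ so that for all $z$ and all $\alpha \le \alpha_0$,

$$\begin{aligned}
\mathbb{E}_{x \sim \gamma}[\min(\cosh(\alpha x + z), 2M)]
&= \tfrac{1}{2}\, \mathbb{E}_{x \sim \gamma}[\min(\cosh(\alpha x + z), 2M) + \min(\cosh(\alpha x - z), 2M)] \\
&\ge \mathbb{E}_{x \sim \gamma}\left[\min\left(\tfrac{1}{2}(\cosh(\alpha x + z) + \cosh(\alpha x - z)), M\right)\right] \\
&= \mathbb{E}_{x \sim \gamma}[\min(\cosh(z)\cosh(\alpha x), M)] \\
&\ge \mathbb{E}_{x \sim \gamma}[\min(\cosh(\alpha x), M)] \\
&\ge \mathbb{E}_{x \sim \gamma}[\cosh(\alpha x)] - \tfrac{1}{M}\, \mathbb{E}_{x \sim \gamma}[\cosh(\alpha x)^2] \\
&= e^{\alpha^2/2} - \tfrac{1}{2M}\, \mathbb{E}_{x \sim \gamma}[1 + \cosh(2\alpha x)] \\
&\ge e^{\alpha^2/2} - \varepsilon/2,
\end{aligned}$$

where in the third inequality we use the fact that $\min(u, v) \ge u - u^2/v$ for all $u, v > 0$. Next, since
the statistical distance between $P$ and $\gamma$ is at most $\sqrt{2 D_\gamma(P)} < \sqrt{2\delta}$, we have that

$$\mathbb{E}_{x \sim P}[\cosh(\alpha x + z)] \ge \mathbb{E}_{x \sim P}[\min(\cosh(\alpha x + z), 2M)] \ge e^{\alpha^2/2} - \varepsilon/2 - 2M\sqrt{2\delta} \ge e^{\alpha^2/2} - \varepsilon$$

for small enough $\delta > 0$. □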

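Proof of Theorem 3.5. Let $\beta_1, \beta_2, \beta_3, \beta_4 > 0$ be small enough constants (depending only on $c$ and $\varepsilon$)
to be determined later. By choosing a small enough $\delta$, and using the concentration of the Gaussian
measure around the sphere of radius $\sqrt{n}$ (see, e.g., [Bal97, Lecture 8]), we can guarantee that $A'$,
defined as

$$A' = \{x \in A : (1 - \beta_1) n \le \|x\|^2 \le (1 + \beta_1) n\},$$

satisfies $\gamma^n(A') \ge \gamma^n(A) - \beta_2 e^{-\delta n} \ge (1 - \beta_2)\gamma^n(A)$, and similarly for $B'$. We can write

$$\begin{aligned}
\Pr_{(x,y)\ \text{is}\ \eta\text{-correlated}}[x \in A \wedge y \in B]
&\ge \Pr_{(x,y)\ \text{is}\ \eta\text{-correlated}}[x \in A' \wedge y \in B'] \\
&= (2\pi)^{-n/2} (2\pi(1 - \eta^2))^{-n/2} \int 1_{A'}(x)\, 1_{B'}(y)\, e^{-\|x\|^2/2}\, e^{-\|y - \eta x\|^2/(2(1 - \eta^2))}\, dx\, dy \\
&= (1 - \eta^2)^{-n/2}\, \mathbb{E}_{x, y \sim \gamma^n}\left[ 1_{A'}(x)\, 1_{B'}(y)\, e^{-\eta^2 \|x\|^2/(2(1 - \eta^2))}\, e^{-\eta^2 \|y\|^2/(2(1 - \eta^2))}\, e^{\eta \langle x, y \rangle/(1 - \eta^2)} \right] \\
&= (1 - \eta^2)^{-n/2}\, \mathbb{E}_{x \sim \gamma^n|_{A'},\, y \sim \gamma^n|_{B'}}\left[ e^{-\eta^2 \|x\|^2/(2(1 - \eta^2))}\, e^{-\eta^2 \|y\|^2/(2(1 - \eta^2))}\, e^{\eta \langle x, y \rangle/(1 - \eta^2)} \right] \gamma^n(A')\, \gamma^n(B') \\
&\ge (1 - \eta^2)^{-n/2}\, e^{-\eta^2 (1 + \beta_1) n/(1 - \eta^2)}\, \mathbb{E}_{x \sim \gamma^n|_{A'},\, y \sim \gamma^n|_{B'}}\left[ e^{\eta \langle x, y \rangle/(1 - \eta^2)} \right] \gamma^n(A')\, \gamma^n(B').
\end{aligned}$$

By averaging this inequality with the analogous one for $-\eta$ and recalling the definition of cosh,
we obtain that the expression we wish to bound is at least

$$(1 - \eta^2)^{-n/2}\, e^{-\eta^2 (1 + \beta_1) n/(1 - \eta^2)}\, \mathbb{E}_{x \sim \gamma^n|_{A'},\, y \sim \gamma^n|_{B'}}\left[ \cosh(\eta \langle x, y \rangle/(1 - \eta^2)) \right] \gamma^n(A')\, \gamma^n(B'). \tag{5}$$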
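
Let $B'' \subseteq B'$ be the set of all $y \in B'$ for which

$$\mathbb{E}_{x \sim \gamma^n|_{A'}}\left[ \cosh(\eta \langle x, y \rangle/(1 - \eta^2)) \right] < (1 - \beta_3)\, e^{(\eta/(1 - \eta^2))^2 (1 - \beta_1) n/2}.$$

We can now complete the proof by showing that $\gamma^n(B'') < \beta_4 \gamma^n(B')$, since this would imply that
(5) is at least

$$\begin{aligned}
&(1 - \eta^2)^{-n/2}\, e^{-\eta^2 (1 + \beta_1) n/(1 - \eta^2)}\, (1 - \beta_4)(1 - \beta_3)\, e^{(\eta/(1 - \eta^2))^2 (1 - \beta_1) n/2}\, \gamma^n(A')\, \gamma^n(B') \\
&\quad \ge e^{\eta^2 n/2}\, e^{-\eta^2 (1 + \beta_1) n/(1 - \eta^2)}\, (1 - \beta_4)(1 - \beta_3)\, e^{(\eta/(1 - \eta^2))^2 (1 - \beta_1) n/2}\, (1 - \beta_2)^2\, \gamma^n(A)\, \gamma^n(B) \\
&\quad \ge (1 - \varepsilon)\, \gamma^n(A)\, \gamma^n(B),
\end{aligned}$$

assuming $\beta_1, \beta_2, \beta_3$ and $\beta_4$ are chosen to be sufficiently small and $n$ is large enough.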

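In order to complete the proof assume, to the contrary, that $\gamma^n(B'') \ge \beta_4 \gamma^n(B') \ge \beta_4 (1 - \beta_2) e^{-\delta n}$.
Let $\beta_5, \beta_6, \beta_7 > 0$ be small enough constants to be determined later. Let
$\sqrt{(1 - \beta_1) n} \le r \le \sqrt{(1 + \beta_1) n}$ be such that the Haar measure $\mu((r S^{n-1} \cap B'')/r)$ of points in $B''$ of norm $r$ is
at least $\gamma^n(B'')$. (The existence of such an $r$ follows from the fact that the Gaussian distribution,
being spherically symmetric, can be seen as the product of a certain distribution on radii $r$ and the
Haar measure on the sphere. Since $B'' \subseteq B'$, the $r$ maximizing the intersection with the sphere
must be in the claimed range.) We now apply Theorem 3.1 with $\varepsilon$ taken to be $\beta_5$, $\delta$ taken to be
$\beta_6$, and $A$ taken to be $A'$. By taking (our) $\delta$ to be small enough, we obtain a vector $y \in B''$ for
which the distribution of $\langle x, y \rangle$ where $x \sim \gamma^n|_{A'}$ is given by the distribution of $\alpha r X + r Y$ for some
$1 - \beta_6 \le \alpha \le 1$ and random variables $X$ and $Y$ satisfying

$$D_\gamma(X \mid Y) \le \beta_5.$$

In particular, by Markov's inequality applied to the value $y'$ taken by $Y$, we have

$$\Pr_{y'}\left[ D_\gamma(X \mid Y = y') \le \sqrt{\beta_5} \right] \ge 1 - \sqrt{\beta_5}.$$

Claim 3.7 now implies that

$$\begin{aligned}
\mathbb{E}_{x \sim \gamma^n|_{A'}}\left[ \cosh(\eta \langle x, y \rangle/(1 - \eta^2)) \right]
&= \mathbb{E}\left[ \cosh\left( \tfrac{\eta}{1 - \eta^2} (\alpha r X + r Y) \right) \right] \\
&\ge (1 - \sqrt{\beta_5}) \left( e^{(\eta \alpha r/(1 - \eta^2))^2/2} - \beta_7 \right) \\
&\ge (1 - \sqrt{\beta_5}) \left( e^{(\eta/(1 - \eta^2))^2 (1 - \beta_6)^2 (1 - \beta_1) n/2} - \beta_7 \right) \\
&\ge (1 - \beta_3)\, e^{(\eta/(1 - \eta^2))^2 (1 - \beta_1) n/2},
\end{aligned}$$

assuming $\beta_5, \beta_6$ and $\beta_7$ are sufficiently small, in contradiction to the assumption that $y \in B''$. □
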
3.3 Corollary for the Boolean cube 

The Gaussian noise correlation inequality we have just proved implies a similar statement for the
Boolean cube, from which Lemma 2.5 follows easily. The statement involves the distribution $\xi_p$
from Definition 2.3.

Corollary 3.8 (Stronger variant of Lemma 2.5). For all $c, \varepsilon > 0$ there exists a $\delta > 0$ such that for all large
enough $n$ and $0 \le p \le c/\sqrt{n}$ the following holds. For all sets $A, B \subseteq \{0, 1\}^n$ with $|A|, |B| \ge 2^{(1 - \delta) n}$, we
have that

$$\frac{1}{2}\left( \xi_{-p}(A \times B) + \xi_p(A \times B) \right) \ge (1 - \varepsilon)\, \xi_0(A \times B).$$

To derive Lemma 2.5, take $R = A \times B$ and the appropriate constant value of $\varepsilon$, and observe that if
$\min\{|A|, |B|\} < 2^{(1 - \delta) n}$, then $\xi_0(R) < 2^{-\delta n}$ and the inequality in that lemma holds trivially, because
its right-hand side is negative.

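A short calculation shows that the inequality in Corollary 3.8 is equivalent to

$$(1 - p^2)^{n/2}\; \mathbb{E}_{x \in A,\, y \in B}\left[ \cosh\left( \ln\left( \frac{1 + p}{1 - p} \right) \cdot \left( \Delta(x, y) - n/2 \right) \right) \right] \ge 1 - \varepsilon.$$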

Hence the corollary can be interpreted as an anti-concentration statement, saying that for sets $A, B$
that are not too small, the Hamming distance $\Delta(x, y)$ between $x \in_R A$ and $y \in_R B$ cannot be too
concentrated around $n/2$. The quantification is delicate. Notice that already for sets of size $2^{n/2}$
this is no longer the case: take, for instance, the sets $A = \{0^{n/2} x : x \in \{0, 1\}^{n/2} \text{ and } |x| = n/4\}$
and $B = \{x 0^{n/2} : x \in \{0, 1\}^{n/2} \text{ and } |x| = n/4\}$.
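The equivalent cosh form of the inequality in Corollary 3.8 comes from a pointwise identity for a single pair $(x, y)$. The sketch below checks that identity, assuming (this is not restated in this section) that $\xi_p$ draws $x$ uniformly and flips each bit independently with probability $(1 - p)/2$ to obtain $y$; the function names are illustrative.

```python
import math

def xi(p, n, dist):
    """Probability under xi_p of one fixed pair (x, y) at Hamming distance dist.

    Assumed model: x uniform in {0,1}^n, each bit of y differs from x
    independently with probability (1 - p) / 2.
    """
    flip = (1.0 - p) / 2.0
    return 2.0 ** (-n) * flip ** dist * (1.0 - flip) ** (n - dist)

def cosh_form(p, n, dist):
    """(1 - p^2)^{n/2} * cosh(ln((1+p)/(1-p)) * (dist - n/2))."""
    return (1.0 - p * p) ** (n / 2.0) * math.cosh(
        math.log((1.0 + p) / (1.0 - p)) * (dist - n / 2.0))

n, p = 20, 0.2
ratios = [0.5 * (xi(p, n, d) + xi(-p, n, d)) / xi(0.0, n, d) for d in range(n + 1)]
forms = [cosh_form(p, n, d) for d in range(n + 1)]
print(ratios[n // 2], forms[n // 2])
```

At distance exactly $n/2$ the ratio equals $(1 - p^2)^{n/2} < 1$, which is why pairs at distance close to $n/2$ are the ones that hurt the inequality.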

Proof. Given any $A, B \subseteq \{0, 1\}^n$, define

$$A' = \{x \in \mathbb{R}^n : \mathrm{sign}(x) \in A\},$$

where $\mathrm{sign}(x) \in \{0, 1\}^n$ is the vector indicating the sign of each coordinate of $x$, and define $B'$
similarly. Then it is easy to check that $\gamma^n(A') = |A|/2^n$ and $\gamma^n(B') = |B|/2^n$, so that
$\gamma^n(A')\gamma^n(B') = \xi_0(A \times B)$, and that for all $\eta$,

$$\Pr_{(x,y)\ \text{is}\ \eta\text{-correlated}}[x \in A' \wedge y \in B'] = \xi_p(A \times B)$$

for $p = 1 - \frac{2}{\pi} \arccos \eta$ (since the probability that $\mathrm{sign}(x) \ne \mathrm{sign}(y)$ when $x, y \in \mathbb{R}$ are $\eta$-correlated
can be computed to be $\frac{1}{\pi} \arccos \eta$). For small $\eta$, we get $p \approx \frac{2}{\pi} \eta$, and the corollary follows from
Theorem 3.5. □
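The sign-disagreement probability used in this proof is easy to estimate by simulation. The following sketch generates $\eta$-correlated standard normals as $y = \eta x + \sqrt{1 - \eta^2}\, g$ and compares the empirical disagreement rate with $\frac{1}{\pi}\arccos\eta$; the value $\eta = 0.3$, the sample size and the seed are arbitrary.

```python
import math
import random

def noise_parameter(eta):
    """p = 1 - (2/pi) * arccos(eta), the cube parameter matching correlation eta."""
    return 1.0 - 2.0 * math.acos(eta) / math.pi

def sign_disagreement_rate(eta, trials, rng):
    """Empirical Pr[sign(x) != sign(y)] for eta-correlated standard normals."""
    bad = 0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        y = eta * x + math.sqrt(1.0 - eta * eta) * rng.gauss(0.0, 1.0)
        if (x >= 0) != (y >= 0):
            bad += 1
    return bad / trials

eta = 0.3
rate = sign_disagreement_rate(eta, 100000, random.Random(2))
print(rate, math.acos(eta) / math.pi, (1.0 - noise_parameter(eta)) / 2.0)
```

For small $\eta$ the function `noise_parameter` is close to $2\eta/\pi$, which is the approximation quoted at the end of the proof.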

4 Reductions, Related Results and Generalizations 

Recall that our argument in Section 2.4 gave an $\Omega(n)$ lower bound on $R(\mathrm{GHD}_{n,\, n/2 - b\sqrt{n},\, \sqrt{2n}})$, for
a certain constant $b$. To obtain an $\Omega(n)$ bound for GHD itself (which, we remind the reader, is
shorthand for $\mathrm{GHD}_{n,\, n/2,\, \sqrt{n}}$), we use a toolkit of simple reductions, given in the next lemma.
Furthermore, using the toolkit, we can generalize the GHD bound to cover most parameter settings,
and using similarly simple reductions, we can obtain optimal lower bounds for related problems.

Lemma 4.1. For all integers $n, k, \ell, m$ and reals $t, g, g' \in [0, n]$, with $n, k > 0$ and $g' \ge g$, the following
relations hold.

(1) $R(\mathrm{GHD}_{n,t,g'}) \le R(\mathrm{GHD}_{n,t,g})$.

(2) $R(\mathrm{GHD}_{n,t,g}) \le R(\mathrm{GHD}_{kn,kt,kg})$.

(3) $R(\mathrm{GHD}_{n,t,g}) \le R(\mathrm{GHD}_{n+\ell+m,\, t+\ell,\, g})$.

(4) $R(\mathrm{GHD}_{n,t,g}) = R(\mathrm{GHD}_{n,\, n-t,\, g})$.

Proof. We give brief sketches of the proofs of these statements.

(1) A correct protocol for $\mathrm{GHD}_{n,t,g}$ is also one for $\mathrm{GHD}_{n,t,g'}$.

(2) We can solve $\mathrm{GHD}_{n,t,g}$ by having Alice and Bob "repeat" their $n$-bit input strings $k$ times each
(which has the effect of also amplifying the gap by a factor of $k$) and then simulating a
protocol for $\mathrm{GHD}_{kn,kt,kg}$.

(3) We can solve $\mathrm{GHD}_{n,t,g}$ by having Alice pad her input by appending the string $0^{\ell+m}$ to it, Bob
pad his by appending $1^{\ell} 0^{m}$ to it, and then simulating a protocol for $\mathrm{GHD}_{n+\ell+m,\, t+\ell,\, g}$.

(4) Alice flips each bit of her input and the parties then simulate a protocol for $\mathrm{GHD}_{n,\, n-t,\, g}$. □
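The input transformations in parts (2) to (4) act on the Hamming distance in a simple way, which the following sketch makes explicit. Inputs are represented as strings over the characters 0 and 1; the helper names are hypothetical.

```python
def hamming(x, y):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))

def repeat_inputs(x, y, k):
    """Part (2): distance d becomes k * d on strings of length k * n."""
    return x * k, y * k

def pad_inputs(x, y, ell, m):
    """Part (3): distance d becomes d + ell on strings of length n + ell + m."""
    return x + "0" * (ell + m), y + "1" * ell + "0" * m

def flip_alice(x, y):
    """Part (4): distance d becomes n - d."""
    return "".join("1" if c == "0" else "0" for c in x), y

x, y = "0110", "0100"
print(hamming(x, y),
      hamming(*repeat_inputs(x, y, 3)),
      hamming(*pad_inputs(x, y, 2, 1)),
      hamming(*flip_alice(x, y)))
```

Each map can be applied by Alice and Bob separately with no communication, so any protocol for the target problem yields one of the same cost for the source problem.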

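As promised, using parts of the above lemma, we establish the following lemma, which
formally completes the proof of the main theorem.

Lemma 4.2. For all integers $n > 0$ and reals $b > 0$, with $n/2 \ge b\sqrt{n}$, we have

$$R(\mathrm{GHD}_{n,\, n/2 - b\sqrt{n},\, \sqrt{2n}}) \le R(\mathrm{GHD}_{2n,\, n,\, \sqrt{2n}}).$$

Proof. Apply part (3) of Lemma 4.1 with $\ell = n/2 + b\sqrt{n}$ and $m = n/2 - b\sqrt{n}$, so that the padded
length is $n + \ell + m = 2n$ and the new threshold is $(n/2 - b\sqrt{n}) + \ell = n$. □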

The previous lemma can in fact be generalized, by invoking the remaining parts of Lemma 4.1,
to obtain a lower bound that handles all thresholds $t$ that are not too close to either end of the
interval $[0, n]$. We omit the details, which are routine, if somewhat tedious.

Proposition 4.3. For all reals $a \in (0, 1]$ and $b > 0$, and all large enough integers $n$, the following holds.
Let $t, g$ be reals with $t \in [an, (1 - a)n]$ and $g \le b\sqrt{n}$. Then $R(\mathrm{GHD}_{n,t,g}) = \Omega(n)$. □

The next result resolves the randomized complexity of $\mathrm{GHD}_{n,\, n/2,\, g}$ for a general gap size, $g$.

Proposition 4.4. For integers $n$ and $g$, with $1 \le g \le n$, we have $R(\mathrm{GHD}_{n,\, n/2,\, g}) = \Theta(\min\{n, n^2/g^2\})$.

Proof. For the upper bound, consider the protocol where Alice and Bob, on input $(x, y) \in \{0, 1\}^n \times
\{0, 1\}^n$, use public randomness to select a subset $S \subseteq [n]$ uniformly at random, from amongst all
subsets of a certain size, $k$, compute $d = |\{i \in S : x_i \ne y_i\}|$ by brute force (say, with Alice
sending Bob the bits $x_i$ for $i \in S$), and output 0 if $d < k/2$ and 1 if $d > k/2$. This protocol clearly
communicates $k$ bits, and an easy application of the Chernoff bound shows that this gives a $\frac{1}{3}$-error
protocol if we choose $k = O(n^2/g^2)$.
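A minimal simulation of this sampling protocol is sketched below. The constant 4 in the choice of $k$ is an assumption made only for the demonstration (the text fixes $k$ up to a constant factor), and the instance sizes and seed are arbitrary.

```python
import random

def sampling_protocol(x, y, k, rng):
    """Guess whether the Hamming distance exceeds n/2 from k sampled coordinates."""
    S = rng.sample(range(len(x)), k)   # public-coin choice of S with |S| = k
    d = sum(x[i] != y[i] for i in S)   # Alice sends x_i for i in S; Bob counts
    return 1 if d > k / 2 else 0

rng = random.Random(7)
n, g = 400, 100
k = 4 * n * n // (g * g)               # assumed constant 4; here k = 64
x = [0] * n
y_far = [1] * (n // 2 + g) + [0] * (n // 2 - g)    # distance n/2 + g
y_close = [1] * (n // 2 - g) + [0] * (n // 2 + g)  # distance n/2 - g

trials = 100
acc_far = sum(sampling_protocol(x, y_far, k, rng) == 1 for _ in range(trials)) / trials
acc_close = sum(sampling_protocol(x, y_close, k, rng) == 0 for _ in range(trials)) / trials
print(k, acc_far, acc_close)
```

The number of sampled coordinates depends only on the ratio $n/g$, not on $n$ itself, which is what makes the bound $O(n^2/g^2)$ sublinear for large gaps.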

For the lower bound, we may assume that $g \ge \sqrt{n}$, for otherwise the claim is obviously true.
Applying part (2) of Lemma 4.1 with $k = g^2/n$ (for simplicity, we ignore divisibility issues), we
obtain $R(\mathrm{GHD}_{n^2/g^2,\; n^2/(2g^2),\; n/g}) \le R(\mathrm{GHD}_{n,\, n/2,\, g})$. The result follows by applying Theorem 2.6 to the
left-hand side of this inequality. □

4.1 Hardness Under Uniform Distribution 

We now turn to proving Theorem 2.7, which extends the $\Omega(n)$ lower bound for GHD to the specific
input distribution $\xi_0$, the uniform distribution on $\{0, 1\}^n \times \{0, 1\}^n$.

Proof of Theorem 2.7. For an integer $n$ and real $p \in [-1, 1]$, let $\mu_{n,p}$ denote the binomial distribution
with parameters $n$ and $(1 - p)/2$; notice that $\mu_{n,0}$ is the symmetric binomial distribution. Let $P$ be
a deterministic protocol for $\mathrm{GHD}_{2n,\, n,\, \sqrt{2n}}$ such that $\mathrm{err}_{\xi_0}(P) = \delta$, with $\xi_0$ here being uniform over
pairs of $2n$-bit strings. Our goal is to show that if $\delta$ is small enough then $\mathrm{cost}(P) = \Omega(n)$. For
$d \in \{0, 1, \ldots, 2n\}$, let $\delta_d$ be the error probability of $P$ on uniform inputs at distance $d$, i.e.,

$$\delta_d := \Pr_{(x,y)}\left[ P(x, y) \ne \mathrm{GHD}_{2n,\, n,\, \sqrt{2n}}(x, y) \;\middle|\; \Delta(x, y) = d \right].$$

Then, we have

$$\delta = \sum_{d=0}^{2n} \mu_{2n,0}(d)\, \delta_d.$$

Let $Q$ be the following protocol for $\mathrm{GHD}_{n,\, n/2 - b\sqrt{n},\, \sqrt{2n}}$. On input $(x, y)$, Alice and Bob first
pad their inputs as in Lemma 4.2. Then, using public randomness, they choose $z \in_R \{0, 1\}^{2n}$
and a random permutation $\sigma \in_R S_{2n}$, and then each player adds $z$ bitwise to their padded input
and permutes the coordinates of the result according to $\sigma$. Let $x', y' \in \{0, 1\}^{2n}$ be the parties'
respective inputs after these transformations. Alice and Bob solve their problem by simulating
$P$ on input $(x', y')$. It is easy to see that $(x', y')$ is uniformly distributed among all pairs with
Hamming distance $n/2 + b\sqrt{n} + \Delta(x, y)$.
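The randomization step of $Q$ preserves the Hamming distance exactly, because adding the same $z$ to both inputs and applying the same permutation to both changes neither the set size of disagreements nor their count. A short sketch, with bit vectors as Python lists and hypothetical helper names:

```python
import random

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def randomize(x, y, rng):
    """Add a shared random z bitwise, then apply a shared random permutation."""
    n = len(x)
    z = [rng.randrange(2) for _ in range(n)]
    sigma = list(range(n))
    rng.shuffle(sigma)
    xz = [a ^ b for a, b in zip(x, z)]
    yz = [a ^ b for a, b in zip(y, z)]
    return [xz[i] for i in sigma], [yz[i] for i in sigma]

rng = random.Random(3)
x = [rng.randrange(2) for _ in range(32)]
y = [rng.randrange(2) for _ in range(32)]
x2, y2 = randomize(x, y, rng)
print(hamming(x, y), hamming(x2, y2))
```

Both players can compute their own transformed string locally from the shared randomness, so the step costs no communication.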

Let $\nu$ denote the hard distribution for $\mathrm{GHD}_{n,\, n/2 - b\sqrt{n},\, \sqrt{2n}}$ implied by our proof of Theorem 2.6.
To be explicit, we have $\nu = \frac{1}{2} \xi_{4b/\sqrt{n}} + \frac{1}{2} \xi_0$. Let $\lambda := \frac{1}{2} \mu_{n,\, 4b/\sqrt{n}} + \frac{1}{2} \mu_{n,0}$ be the corresponding
distribution of Hamming distances. It then follows that

$$\mathrm{err}_\nu(Q) = \sum_{d=0}^{n} \lambda(d)\, \delta_{d + n/2 + b\sqrt{n}}.$$

Suppose we are given a constant $\alpha > 0$. From standard properties of the binomial distribution, it
follows that there exist reals $C, K > 0$ (depending on $\alpha$ and $b$, but independent of $n$) such that

$$\sum_{d=0}^{n/2 - C\sqrt{n}} \lambda(d) + \sum_{d = n/2 + C\sqrt{n}}^{n} \lambda(d) \le \alpha,$$

and for integers $d \in [n/2 - C\sqrt{n},\, n/2 + C\sqrt{n}]$,

$$\lambda(d) \le K \mu_{2n,0}(d + n/2 + b\sqrt{n}).$$

It then follows that $\mathrm{err}_\nu(Q) \le \alpha + K\delta$. By picking $\alpha$ sufficiently small, we obtain by our proof of
Theorem 2.6 that, for small enough $\delta$, $\mathrm{cost}(Q) = \Omega(n)$. Since $Q$ communicates exactly as many
bits as $P$, it follows that $\mathrm{cost}(P) = \Omega(n)$. □

4.2 Related Communication Problems with a Gap 

We remark that results similar to those for GHD also hold for GAP-INTERSECTION-SIZE, where
Alice and Bob have sets $x, y \subseteq [n]$ as inputs and are required to distinguish between the cases
$|x \cap y| < t - g$ and $|x \cap y| > t + g$, for a threshold parameter $t$ and gap size $g$. Let this problem be
denoted by $\mathrm{GIS}_{n,t,g}$. We then have the following result, by an easy reduction from GHD.

Proposition 4.5. Suppose $t \in \Omega(n) \cap (n - \Omega(n))$ and $g = O(\sqrt{n})$. Then $R(\mathrm{GIS}_{n,t,g}) = \Omega(n)$. □

Finally, we also remark that results similar to those for GHD also hold for the closely related
(in fact, essentially equivalent) problem GAP-INNER-PRODUCT. Here, Alice and Bob have
$d$-dimensional unit vectors $x, y$ as inputs and are trying to distinguish between the cases $\langle x, y \rangle > \varepsilon$
and $\langle x, y \rangle < -\varepsilon$. There is a simple $O(1/\varepsilon^2)$ protocol for this problem: the players use shared
randomness to choose $O(1/\varepsilon^2)$ random hyperplanes and then compare which side of each
hyperplane their inputs lie in. Our main theorem implies that this is tight assuming $d \ge 1/\varepsilon^2$, as can be
seen by embedding the hypercube in the set $\{-1/\sqrt{n}, 1/\sqrt{n}\}^n$.
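The hyperplane protocol and the hypercube embedding can both be sketched in a few lines. In the illustration below, the number of hyperplanes (200), the dimension (20) and the seed are arbitrary choices, and the decision rule (majority of agreeing sides) is one natural way to "compare sides" rather than a rule taken from the text.

```python
import math
import random

def embed(bits):
    """Map a bit string to a unit vector in {-1/sqrt(n), 1/sqrt(n)}^n."""
    n = len(bits)
    return [(1.0 if b == 0 else -1.0) / math.sqrt(n) for b in bits]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hyperplane_protocol(x, y, num_planes, rng):
    """Return +1 if most random hyperplanes leave x and y on the same side, else -1."""
    agree = 0
    for _ in range(num_planes):
        h = [rng.gauss(0.0, 1.0) for _ in x]   # shared random normal vector
        if (dot(h, x) >= 0) == (dot(h, y) >= 0):
            agree += 1                          # one bit from each player suffices
    return 1 if agree > num_planes / 2 else -1

rng = random.Random(5)
a = [0] * 20
b = [1] * 5 + [0] * 15     # Hamming distance 5 from a
c = [1] * 15 + [0] * 5     # Hamming distance 15 from a
print(dot(embed(a), embed(b)), dot(embed(a), embed(c)))
print(hyperplane_protocol(embed(a), embed(b), 200, rng),
      hyperplane_protocol(embed(a), embed(c), 200, rng))
```

Under this embedding the inner product equals $1 - 2\Delta/n$ for Hamming distance $\Delta$, so a gap of $\sqrt{n}$ around $n/2$ becomes a gap of order $1/\sqrt{n}$ around 0.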